Adaptive bit-rate adjustment of multimedia communications channels using transport control protocol

ABSTRACT

A videoconferencing apparatus having a corresponding method and instruction program comprises a video codec to generate video data; a video packetizer to produce TCP packets of the video data; an audio codec to generate audio data; an audio packetizer to produce packets of the audio data; a transmit circuit to transmit the TCP packets; a receive circuit to receive an RTCP receiver report representing a number of the TCP packets of the video data received by a receiver; and a controller to generate an estimate of a number of the TCP packets of the video data transmitted but not yet received based on a difference between a number of the TCP packets of the video data transmitted by the transmit circuit and the number of the TCP packets of the video data received by the receiver, and control the video bit rate according to the estimate.

BACKGROUND

The present invention relates generally to bit rate control for digital data transmission. More particularly, the present invention relates to adaptive bit-rate adjustment of multimedia communications channels using transmission control protocol (TCP).

Multimedia communications channels such as those used in Internet videoconferencing generally employ the user datagram protocol (UDP) to transport packets of video data. Because UDP does not support the retransmission of lost packets, it is well-suited to real-time data transmission. The delay required for the retransmission of a lost packet in a real-time multimedia communications channel would produce a noticeable fault at the receiver such as frozen video and clicks in the audio.

However, UDP is a connectionless protocol, and so presents a network security issue. Many businesses will not permit UDP connections to traverse their corporate firewalls, and so cannot use UDP videoconferencing systems.

Another transport protocol is available, namely transmission control protocol (TCP). However, TCP retransmits lost packets, and so is generally not well-suited for real-time multimedia communications. TCP also provides network congestion control by effectively changing the bit rate of the communications channel, lowering the bit rate of each channel on a congested network connection to allow all of the channels to share the network connection. This congestion control can adversely affect multimedia communications. For example, if a videoconferencing application is transmitting at a bit rate greater than that permitted by TCP congestion control, a growing transmission lag will result. If the difference in bit rates is 10%, then at the end of a one-hour videoconference the lag will be 6 minutes, which is hardly real-time.
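The 6-minute figure follows from simple arithmetic. The following is a minimal sketch, assuming that the lag is measured as the amount of media generated but not yet delivered, and that a fixed fraction of the generated bit rate exceeds what the channel carries. The function name lag_seconds and its parameters are illustrative and do not appear elsewhere in this specification.

```python
def lag_seconds(duration_s: float, excess_fraction: float) -> float:
    """Seconds of media generated but not yet delivered after duration_s.

    excess_fraction is the share of the generated bit rate that the channel
    cannot carry (0.10 for the 10% example in the text).
    """
    return duration_s * excess_fraction


# One-hour conference with a 10% bit-rate mismatch, expressed in minutes.
lag_minutes = lag_seconds(3600, 0.10) / 60
```

With a 10% mismatch the lag grows by about six seconds for every minute of the conference, which is why it reaches roughly 6 minutes after one hour.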

SUMMARY

In general, in one aspect, the invention features a videoconferencing apparatus comprising a video codec to generate video data at a video bit rate; a video packetizer to produce Transport Control Protocol (TCP) packets of the video data; an audio codec to generate audio data at an audio bit rate; an audio packetizer to produce packets of the audio data; a transmit circuit to transmit the TCP packets of the video data and the packets of the audio data; a receive circuit to receive a Real-time Transport Control Protocol (RTCP) receiver report representing a number of the TCP packets of the video data received by a receiver of the TCP packets of the video data; and a controller to generate an estimate of a number of the TCP packets of the video data transmitted by the transmit circuit but not yet received by the receiver based on a difference between a number of the TCP packets of the video data transmitted by the transmit circuit and the number of the TCP packets of the video data received by the receiver, and control the video bit rate according to the estimate.

In general, in another aspect, the invention features an apparatus and corresponding method and computer program. The apparatus comprises a data generator to generate data at a bit rate; a packetizer to produce Transport Control Protocol (TCP) packets of the data; a transmit circuit to transmit the TCP packets of the data; and a controller to control the bit rate according to an estimate of a number of the TCP packets of the data transmitted by the transmit circuit but not yet received by a receiver of the TCP packets of the data.

Particular implementations can include one or more of the following features. The controller generates the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver based on a number of the TCP packets of the data transmitted by the transmit circuit following an initialization event, a number of the TCP packets of the data received by the receiver following the initialization event, and a previous estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver. To generate the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller determines a plurality of differences, each at a different time, between a number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and a number of the TCP packets of the data received by the receiver following the initialization event; and generates the estimate according to at least one of the group consisting of a median of the differences, a mean of the differences, and a mode of the differences. 
To control the bit rate according to the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller decreases the bit rate when a first predetermined number of consecutive differences between the number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall above a threshold that is a first function of the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; increases the bit rate when a second predetermined number of consecutive differences between the number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall below a threshold that is a second function of the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; and increases the bit rate when the bit rate has not been increased for a predetermined interval. To control the bit rate according to the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller further generates a new estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; determines whether the new estimate falls outside an estimate window surrounding the estimate; controls the bit rate based on the new estimate when the new estimate falls outside the estimate window surrounding the estimate; and controls the bit rate based on the estimate when the new estimate falls inside the estimate window surrounding the estimate. 
The apparatus further comprises a receive circuit to receive a packet of data representing the number of the TCP packets of the data received by the receiver. The packet of data representing the number of the TCP packets of the data received by the receiver comprises a Real-time Transport Control Protocol (RTCP) receiver report packet. The data generator comprises at least one of the group consisting of a video codec to encode video data; and an audio codec to encode audio data. A videoconferencing system comprises the apparatus.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 shows a videoconferencing system in communication with a network such as the Internet.

FIG. 2 shows an adaptive bit-rate control process for the videoconferencing system of FIG. 1 according to a preferred embodiment of the present invention.

FIG. 3 shows a process for controlling the bit rate of the video data based on an estimate of the number of the TCP video packets currently in transit.

The leading digit(s) of each reference numeral used in this specification indicates the number of the drawing in which the reference numeral first appears.

DETAILED DESCRIPTION

Embodiments of the present invention provide adaptive bit-rate adjustment of multimedia communications channels using transmission control protocol (TCP). The techniques disclosed herein are especially useful in real-time two-way applications such as videoconferencing and voice-over-IP telephony, but are also applicable to one-way communications channels, and to communications channels with less stringent latency requirements.

FIG. 1 shows a videoconferencing system 100 in communication with a network 102 such as the Internet. But while embodiments of the present invention are described with respect to network videoconferencing, the techniques disclosed herein are equally applicable to other sorts of one-way and two-way communications applications over networks or direct links.

Videoconferencing system 100 comprises a video source 104 that provides a video signal, for example from a videoconferencing camera, a video codec 106 to encode the video signal as video data, and a video packetizer 108 to produce TCP packets of the video data. Videoconferencing system 100 further comprises an audio source 110 that provides an audio signal, for example from a microphone, an audio codec 112 to encode the audio signal as audio data, and an audio packetizer 114 to produce TCP packets of the audio data. Videoconferencing system 100 further comprises one or more transmit circuits 116 such as Ethernet ports to transmit the video and audio TCP packets, one or more receive circuits 118 to receive data and control packets from network 102, and a controller 120 to control videoconferencing system 100.

FIG. 2 shows an adaptive bit-rate control process 200 for videoconferencing system 100 according to a preferred embodiment of the present invention. While process 200 is described with respect to controlling a video data bit rate, it is equally applicable to controlling an audio data bit rate or bit rates for any sort of digital data generated by data generators such as audio and video sources, codecs, and the like. Preferably controller 120 executes two instances of process 200 concurrently. One instance of process 200 controls the video data bit rate while the other instance controls the audio data bit rate.

During a videoconference, a source, such as video source 104, produces a data signal, e.g., a video signal (step 202). A codec, such as video codec 106, encodes the video to produce video data at a video bit rate (step 204). A packetizer, such as video packetizer 108, produces TCP packets of the video data (step 206). Transmit circuit 116 transmits the TCP packets of video data (step 208).

Controller 120 estimates the number of the TCP packets of video data in transit, that is, the number of the TCP packets of video data transmitted by transmit circuit 116 but not yet received by the receiver of the packets such as another videoconferencing system (step 210). An estimate is used because it is not possible to know the exact number of the TCP video packets currently in transit. Controller 120 controls the bit rate of the video data according to the estimate of the number of the TCP packets of video data in transit (step 212).

FIG. 3 shows a process 300 for controlling the bit rate of the video data based on an estimate of the number of the TCP video packets currently in transit. Controller 120 determines a difference DIFF between a number of the TCP packets of the video data transmitted by transmit circuit 116 and a number of the TCP packets of the video data received by the receiver (step 302), preferably over a predetermined interval.

The number of the TCP packets of the (video) data received by the receiver is obtained from the receiver, preferably as a Real-time Transport Control Protocol (RTCP) receiver report packet sent by the receiver and received by receive circuit 118. The number of the TCP packets of the video data transmitted by videoconferencing system 100 is obtained from videoconferencing system 100. In a preferred embodiment, the RTCP reporting interval is two seconds, and the numbers of packets are counted starting with an initialization event, such as the start of the current videoconferencing session.
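The computation of DIFF in step 302 can be sketched as follows. This is an illustrative sketch only: the class name PacketCounters and the parameter received_reported are assumptions, and the parsing of the RTCP receiver report into a received-packet count is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class PacketCounters:
    """Sender-side packet count, starting at the initialization event."""

    sent: int = 0  # packets transmitted since the initialization event

    def on_packet_sent(self) -> None:
        """Call once for every TCP packet of video data handed to the network."""
        self.sent += 1

    def diff(self, received_reported: int) -> int:
        """Return DIFF for one receiver report.

        received_reported is the receiver's packet count since the same
        initialization event, as carried by the report (assumed already parsed).
        """
        return self.sent - received_reported
```

Because both counts start at the same initialization event, each DIFF value is a snapshot of how many packets were outstanding when the report was processed.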

Controller 120 also estimates the number D of transmitted packets of the video data that are in transit over network 102 (step 304). Preferably the estimate D is calculated as the median of the previous 50 values of DIFF, although a different number of values of DIFF can be used, and instead of the median, the mean, the mode or some other function of the values of DIFF can be used.

However, upon initialization an insufficient number of values of DIFF are available. Preferably the first value of DIFF is used until 7 values of DIFF have been calculated. Then the median of all of the values of DIFF is used until 50 values of DIFF have been calculated. Thereafter the sliding window of 50 values of DIFF is used, as described above.
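The estimate D, including its start-up behavior, can be sketched as follows. The function name estimate_in_transit and the constants WARMUP and WINDOW are illustrative names for the preferred values 7 and 50 given above; the list of DIFF values is assumed to be kept by the caller, oldest first.

```python
from statistics import median

WARMUP = 7   # below this many samples, the first DIFF value is used
WINDOW = 50  # size of the sliding window once enough samples exist


def estimate_in_transit(diffs):
    """Return the estimate D from all DIFF values so far (oldest first)."""
    if not diffs:
        raise ValueError("at least one DIFF value is required")
    if len(diffs) < WARMUP:
        return diffs[0]
    # With 7 to 50 samples this slice is the whole history; beyond 50
    # samples it is the sliding window of the most recent 50.
    return median(diffs[-WINDOW:])
```

For example, with the six values 10, 1, 2, 3, 4, 5 the estimate is the first value, 10. Once a seventh value arrives, the estimate becomes the median of all seven.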

If network 102 is slow, the first few estimates of D might be too large, for example when the initial video bit rate is much greater than the average bit rate of network 102. Therefore the initial video bit rate is preferably limited based on the size S of the average packet of video data transmitted by videoconferencing system 100. In a preferred embodiment, if the product DS of the estimate D and the average packet size S exceeds K bits, then the bit rate is decreased by K/DS until DS<K, where K=40,000. Of course, other values for K can be used.
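One possible reading of this limit is sketched below. It assumes that DS denotes the product of the estimate D and the average packet size S in bits, and that decreasing the bit rate "by K/DS" means scaling the bit rate by the factor K/DS. The text does not state how the adjustment is repeated, so the sketch performs a single scaling step and leaves repetition to the caller. The function name and parameter names are illustrative.

```python
K_BITS = 40_000  # preferred value of K from the text


def limit_initial_bit_rate(bit_rate, d_packets, avg_packet_bits, k_bits=K_BITS):
    """Scale bit_rate by K/DS when the estimated bits in transit reach K.

    d_packets is the estimate D and avg_packet_bits is the size S, so their
    product is the estimated number of bits in transit (an assumed reading).
    """
    bits_in_transit = d_packets * avg_packet_bits
    if bits_in_transit < k_bits:
        return bit_rate  # DS < K: no limiting needed
    return bit_rate * k_bits / bits_in_transit
```

Under this reading, an estimate of 10 packets of 8,000 bits each gives DS=80,000, so the bit rate is scaled by 0.5.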

Process 300 benefits from the stability of the value of D. Therefore, in a preferred embodiment, when a new value of D is calculated, it is compared to the previous value of D. If the new value of D falls inside an estimate window surrounding the previous value of D, then the new value of D is discarded, and the previous value of D is used. Preferably the estimate window is D±one standard deviation of DIFF. Preferably the standard deviation of DIFF is computed as the median absolute deviation of the previous 50 values of DIFF, although other computation methods can be used.
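The estimate window can be sketched as follows. The function name stabilize_estimate is illustrative, and treating a new value that lies exactly on the edge of the window as inside the window is an assumption, since the text does not address the boundary case.

```python
def stabilize_estimate(previous_d, new_d, sdev):
    """Keep the previous D unless the new D leaves the window D +/- sdev."""
    if previous_d is None:
        return new_d  # no earlier estimate exists yet
    if abs(new_d - previous_d) <= sdev:
        return previous_d  # inside the window: discard the new value
    return new_d  # outside the window: adopt the new value
```

With a previous estimate of 20 and a deviation of 3, a new estimate of 22 is discarded, while a new estimate of 25 replaces the previous value.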

Process 300 estimates the standard deviation SDev of the packets of video data in transit (step 306). Preferably the standard deviation SDev is computed as the median absolute deviation of the previous 50 values of DIFF, although other computation methods can be used. However, upon initialization an insufficient number of values of DIFF are available. Preferably the standard deviation SDev is computed as the average of the highest and lowest values of DIFF until 7 samples of DIFF have been received, although other computation methods can be used. Thereafter the standard deviation SDev is computed as described above.
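The computation of SDev in step 306 can be sketched as follows, using the preferred values from the text. The function name estimate_sdev is illustrative. The sketch uses the unscaled median absolute deviation about the window median, which is an assumption, since the text does not say whether a scaling constant is applied.

```python
from statistics import median


def estimate_sdev(diffs):
    """Return SDev from all DIFF values so far (oldest first)."""
    if not diffs:
        raise ValueError("at least one DIFF value is required")
    if len(diffs) < 7:
        # Start-up rule from the text: average of the highest and
        # lowest DIFF values seen so far.
        return (max(diffs) + min(diffs)) / 2
    window = diffs[-50:]
    center = median(window)
    # Median absolute deviation of the window about its median.
    return median([abs(x - center) for x in window])
```

For the seven values 1 through 7, the median is 4 and the absolute deviations are 3, 2, 1, 0, 1, 2, 3, so the sketch returns 2.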

Process 300 then controls the bit rate of the video data according to the values of DIFF and D. In particular, process 300 decreases the bit rate when M consecutive values of DIFF fall above a threshold that is a function of D and increases the bit rate when N consecutive values of DIFF fall below a threshold that is a function of D. Multiple thresholds can be used, as described in detail below.

Process 300 maintains a counter I for each threshold. For four thresholds, process 300 maintains counters I1, I2, I3, and I4. Process 300 also preferably maintains a counter I5 to count the number of receiver reports for which no video bit rate adjustments are made.

If a value of DIFF exceeds the sum of the value of D and twice the standard deviation SDev (step 308), then controller 120 increments counter I1 (step 310). If I1=3, meaning DIFF>D+2SDev for three consecutive RTCP receiver reports (step 312), then controller 120 decreases the video bit rate (step 314). Preferably the decrease is 20%, although other values can be used.

After changing the video bit rate, and before making another estimate of the number of TCP packets of video data in transit, process 300 waits for a predetermined interval, preferably by skipping 2 RTCP receiver reports (step 316). Process 300 also resets all of the counters I1, I2, I3, I4, and I5 after changing the video bit rate (step 318). Process 300 then resumes at step 302.

However, if at step 308 DIFF≦D+2SDev, counter I1 is reset to zero (step 320) to ensure that counter I1 counts only consecutive RTCP receiver reports where DIFF>D+2SDev.

If a value of DIFF exceeds the sum of the value of D and the standard deviation SDev (step 322), then controller 120 increments counter I2 (step 324). If I2=5, meaning DIFF>D+SDev for five consecutive RTCP receiver reports (step 326), then controller 120 decreases the video bit rate (step 314), skips 2 RTCP reports (step 316), and resets counters I (step 318). Process 300 then resumes at step 302. Preferably the decrease is 20%, although other values can be used.

However, if at step 322 DIFF≦D+SDev, counter I2 is reset to zero (step 328) to ensure that counter I2 counts only consecutive RTCP receiver reports where DIFF>D+SDev.

If a value of DIFF exceeds the value of D (step 330), then controller 120 increments counter I3 (step 332). If I3=9, meaning DIFF>D for nine consecutive RTCP receiver reports (step 334), then controller 120 decreases the video bit rate (step 314), skips 2 RTCP reports (step 316), and resets counters I (step 318). Process 300 then resumes at step 302. Preferably the decrease is 20%, although other values can be used.

However, if at step 330 DIFF≦D, counter I3 is reset to zero (step 336) to ensure that counter I3 counts only consecutive RTCP receiver reports where DIFF>D.

If a value of DIFF is below the value of D (step 338), then controller 120 increments counter I4 (step 340). If I4=6, meaning DIFF<D for six consecutive RTCP receiver reports (step 342), then controller 120 increases the video bit rate (step 344), skips 2 RTCP reports (step 316), and resets counters I (step 318). Process 300 then resumes at step 302. Preferably the increase is 10%, although other values can be used.

However, if at step 338 DIFF≧D, counter I4 is reset to zero (step 346) to ensure that counter I4 counts only consecutive RTCP receiver reports where DIFF<D.

To ensure that the video bit rate does not stabilize at an unnecessarily low value, if no changes to the video bit rate are made for J consecutive values of DIFF (that is, for J RTCP receiver report packets), then controller 120 increases the video bit rate. Preferably J=16 and the increase is 10%, although other values can be used. Therefore when no video bit rate adjustment is made for an RTCP receiver report, process 300 increments counter I5 (step 348). If I5=16, meaning no video bit rate adjustment has been made for 16 consecutive RTCP receiver reports (step 350), then controller 120 increases the video bit rate (step 344), skips 2 RTCP reports (step 316), and resets counters I (step 318). Otherwise process 300 resumes with step 302.
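Steps 308 through 350 can be sketched as a small state machine that is driven once per RTCP receiver report. This is an illustrative sketch, not a definitive implementation. The class name RateController and the method on_report are assumptions, the values of DIFF, D, and SDev are assumed to be computed by the caller, and skipped reports are assumed not to advance any counter.

```python
from dataclasses import dataclass, field


@dataclass
class RateController:
    bit_rate: float
    counters: list = field(default_factory=lambda: [0] * 5)  # I1..I5
    skip: int = 0  # receiver reports still to be ignored

    def _adjust(self, factor: float) -> None:
        """Change the rate (step 314 or 344), then steps 316 and 318."""
        self.bit_rate *= factor
        self.skip = 2
        self.counters = [0] * 5

    def on_report(self, diff: float, d: float, sdev: float) -> float:
        """Process one receiver report and return the resulting bit rate."""
        if self.skip > 0:
            self.skip -= 1
            return self.bit_rate
        # (condition, consecutive count required, rate factor) for I1..I4
        rules = [
            (diff > d + 2 * sdev, 3, 0.8),  # steps 308-314
            (diff > d + sdev, 5, 0.8),      # steps 322-326
            (diff > d, 9, 0.8),             # steps 330-334
            (diff < d, 6, 1.1),             # steps 338-344
        ]
        for i, (hit, limit, factor) in enumerate(rules):
            if hit:
                self.counters[i] += 1
                if self.counters[i] == limit:
                    self._adjust(factor)
                    return self.bit_rate
            else:
                self.counters[i] = 0  # only consecutive reports count
        self.counters[4] += 1  # step 348: no adjustment for this report
        if self.counters[4] == 16:  # step 350
            self._adjust(1.1)
        return self.bit_rate
```

The tiered counts mean that a large excursion above D triggers a decrease after three reports, while a small but persistent excursion takes nine reports to do so.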

Preferably process 300 includes a burst detection routine to handle bursts of video data, for example when an I-frame is to be sent when the video includes significant motion. When such a burst occurs, controller 120 halves the video bit rate, and maintains that value for 3 RTCP receiver report packets before resuming process 300.
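The hold behavior after a burst can be sketched as follows. The text does not describe how a burst is detected, so detection is assumed to be performed elsewhere, and the sketch covers only halving the bit rate and suspending process 300 for 3 receiver reports. The class name BurstHold and its methods are illustrative.

```python
class BurstHold:
    """Suspends process 300 for a fixed number of reports after a burst."""

    HOLD_REPORTS = 3  # preferred value from the text

    def __init__(self) -> None:
        self.remaining = 0

    def on_burst(self, bit_rate: float) -> float:
        """Start the hold and return the halved bit rate."""
        self.remaining = self.HOLD_REPORTS
        return bit_rate / 2

    def on_report(self) -> bool:
        """Return True when process 300 may run for this receiver report."""
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True
```

A caller would consult on_report for each receiver report and run process 300 only when it returns True.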

The invention can be implemented in digital electronic circuitry, or in hardware, firmware, software, or in combinations of them. An apparatus of the invention can be implemented in a computer program product tangibly embodied in a device-readable medium, e.g., storage device, for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other implementations are within the scope of the following claims. 

1. A videoconferencing apparatus comprising: a video codec to generate video data at a video bit rate; a video packetizer to produce Transport Control Protocol (TCP) packets of the video data; an audio codec to generate audio data at an audio bit rate; an audio packetizer to produce packets of the audio data; a transmit circuit to transmit the TCP packets of the video data and the packets of the audio data; a receive circuit to receive a Real-time Transport Control Protocol (RTCP) receiver report representing a number of the TCP packets of the video data received by a receiver of the TCP packets of the video data; and a controller to generate an estimate of a number of the TCP packets of the video data transmitted by the transmit circuit but not yet received by the receiver based on a difference between a number of the TCP packets of the video data transmitted by the transmit circuit and the number of the TCP packets of the video data received by the receiver, and control the video bit rate according to the estimate.
 2. An apparatus comprising: a data generator to generate data at a bit rate; a packetizer to produce Transport Control Protocol (TCP) packets of the data; a transmit circuit to transmit the TCP packets of the data; and a controller to control the bit rate according to an estimate of a number of the TCP packets of the data transmitted by the transmit circuit but not yet received by a receiver of the TCP packets of the data.
 3. The apparatus of claim 2, wherein: the controller generates the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver based on a number of the TCP packets of the data transmitted by the transmit circuit following an initialization event, a number of the TCP packets of the data received by the receiver following the initialization event, and a previous estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver.
 4. The apparatus of claim 3, wherein, to generate the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller: determines a plurality of differences, each at a different time, between a number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and a number of the TCP packets of the data received by the receiver following the initialization event; and generates the estimate according to at least one of the group consisting of a median of the differences, a mean of the differences, and a mode of the differences.
 5. The apparatus of claim 2, wherein, to control the bit rate according to the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller: decreases the bit rate when a first predetermined number of consecutive differences between the number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall above a threshold that is a first function of the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; increases the bit rate when a second predetermined number of consecutive differences between the number of the TCP packets of the data transmitted by the transmit circuit following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall below a threshold that is a second function of the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; and increases the bit rate when the bit rate has not been increased for a predetermined interval.
 6. The apparatus of claim 5, wherein, to control the bit rate according to the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver, the controller further: generates a new estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver; determines whether the new estimate falls outside an estimate window surrounding the estimate; controls the bit rate based on the new estimate when the new estimate falls outside the estimate window surrounding the estimate; and controls the bit rate based on the estimate when the new estimate falls inside the estimate window surrounding the estimate.
 7. The apparatus of claim 2, further comprising: a receive circuit to receive a packet of data representing the number of the TCP packets of the data received by the receiver.
 8. The apparatus of claim 7, wherein the packet of data representing the number of the TCP packets of the data received by the receiver comprises: a Real-time Transport Control Protocol (RTCP) receiver report packet.
 9. The apparatus of claim 2, wherein the data generator comprises at least one of the group consisting of: a video codec to encode video data; and an audio codec to encode audio data.
 10. A videoconferencing system comprising the apparatus of claim 2.
 11. A method comprising: generating data at a bit rate; producing Transport Control Protocol (TCP) packets of the data; transmitting the TCP packets of the data; and controlling the bit rate according to an estimate of a number of the TCP packets of the data transmitted by the transmit circuit but not yet received by a receiver of the TCP packets of the data.
 12. The method of claim 11, further comprising: generating the estimate of the number of the TCP packets of the data transmitted by the transmit circuit but not yet received by the receiver based on a number of the TCP packets of the data transmitted following an initialization event, a number of the TCP packets of the data received by the receiver following the initialization event, and a previous estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver.
 13. The method of claim 12, wherein generating the estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver comprises: determining a plurality of differences, each at a different time, between a number of the TCP packets of the data transmitted following an initialization event and a number of the TCP packets of the data received by the receiver following the initialization event; and generating the estimate according to at least one of the group consisting of a median of the differences, a mean of the differences, and a mode of the differences.
 14. The method of claim 11, wherein controlling the bit rate according to the estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver comprises: decreasing the bit rate when a first predetermined number of consecutive differences between the number of the TCP packets of the data transmitted following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall above a threshold that is a first function of the estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver; increasing the bit rate when a second predetermined number of consecutive differences between the number of the TCP packets of the data transmitted following an initialization event and the number of the TCP packets of the data received by the receiver following the initialization event fall below a threshold that is a second function of the estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver; and increasing the bit rate when the bit rate has not been increased for a predetermined interval.
 15. The method of claim 14, wherein controlling the bit rate according to the estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver of the TCP packets of the data further comprises: generating a new estimate of the number of the TCP packets of the data transmitted but not yet received by the receiver; determining whether the new estimate falls outside an estimate window surrounding the estimate; controlling the bit rate based on the new estimate when the new estimate falls outside the estimate window surrounding the estimate; and controlling the bit rate based on the estimate when the new estimate falls inside the estimate window surrounding the estimate.
 16. The method of claim 11, further comprising: receiving a packet of data representing the number of the TCP packets of the data received by the receiver.
 17. The method of claim 16, wherein the packet of data representing the number of the TCP packets of the data received by the receiver comprises: a Real-time Transport Control Protocol (RTCP) receiver report packet.
 18. The method of claim 11, wherein the data comprises at least one of video data or audio data.
 19. A device-readable medium or waveform containing a program of instructions executable by a device and adapted to perform the method of claim 11.
 20. A device-readable medium or waveform containing a program of instructions executable by a device adapted to perform the method of claim 12.